An empirical study of smoothing techniques for language modeling
Authors
Abstract
Similar resources
An Empirical Study of Smoothing Techniques for Language Modeling
We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative pe...
An Empirical Study of Smoothing Techniques for Language Modeling
We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative perfo...
Smoothing Techniques for Tree-k-Grammar-Based Natural Language Modeling
In a previous work, a new probabilistic context-free grammar (PCFG) model for natural language parsing derived from a tree bank corpus has been introduced. The model estimates the probabilities according to a generalized k-grammar scheme for trees. It allows for faster parsing, decreases considerably the perplexity of the test samples and tends to give more structured and refined parses. Howeve...
An Empirical Study of Predictive Modeling Techniques of Software Quality
The primary goal of software quality engineering is to apply various techniques and processes to produce a high-quality software product. One strategy is applying data mining techniques to software metrics and defect data collected during the software development process to identify the potential low-quality program modules. In this paper, we investigate the use of feature selection in the conte...
Long Distance Dependency in Language Modeling: An Empirical Study
This paper presents an extensive empirical study on two language modeling techniques, linguistically-motivated word skipping and predictive clustering, both of which are used in capturing long distance word dependencies that are beyond the scope of a word trigram model. We compare the techniques to others that were proposed previously for the same purpose. We evaluate the resulting models on th...
Journal
Journal title: Computer Speech & Language
Year: 1999
ISSN: 0885-2308
DOI: 10.1006/csla.1999.0128